1.
mSphere ; 9(4): e0079923, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38501831

ABSTRACT

BK polyomavirus (BKPyV) is a double-stranded DNA virus causing nephropathy, hemorrhagic cystitis, and urothelial cancer in transplant patients. The BKPyV-encoded capsid protein Vp1 and large T-antigen (LTag) are key targets of neutralizing antibodies and cytotoxic T-cells, respectively. Our single-center data suggested that variability in Vp1 and LTag may contribute to failing BKPyV-specific immune control and impact vaccine design. We, therefore, analyzed all available entries in GenBank (1516 VP1; 742 LTAG) and explored potential structural effects using computational approaches. BKPyV-genotype (gt)1 was found in 71.18% of entries, followed by BKPyV-gt4 (19.26%), BKPyV-gt2 (8.11%), and BKPyV-gt3 (1.45%), but rates differed according to country and specimen type. Vp1-mutations matched a serotype different from the assigned one or were serotype-independent in 43%, and 18% affected more than one amino acid. Notable Vp1-mutations altered antibody-binding domains, interactions with sialic acid receptors, or were predicted to change conformation. LTag-sequences were more conserved, with only 16 mutations detectable in more than one entry and without significant effects on LTag-structure or interaction domains. However, LTag changes were predicted to affect HLA-class I presentation of immunodominant 9mers to cytotoxic T-cells. These global data strengthen single-center observations and specifically our earlier findings revealing mutant 9mer epitopes conferring immune escape from HLA-I cytotoxic T cells. We conclude that variability of BKPyV-Vp1 and LTag may have important implications for diagnostic assays assessing BKPyV-specific immune control and for vaccine design.
IMPORTANCE: Type and rate of amino acid variations in BKPyV may provide important insights into BKPyV diversity in human populations and represent an important step toward defining determinants of BKPyV-specific immunity needed to protect vulnerable patients from BKPyV diseases. Our analysis of BKPyV sequences obtained from human specimens reveals an unexpectedly high genetic variability for this double-stranded DNA virus that strongly relies on host cell DNA replication machinery with its proofreading and error correction mechanisms. BKPyV variability and immune escape should be taken into account when designing further approaches to antivirals, monoclonal antibodies, and vaccines for patients at risk of BKPyV diseases.

2.
Bioinformatics ; 40(1), 2024 01 02.
Article in English | MEDLINE | ID: mdl-38175775

ABSTRACT

MOTIVATION: Language models are routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful new approaches in the bioinformatics field. Protein language models (pLMs) generate high-dimensional embeddings on a per-residue level and encode a "semantic meaning" of each individual amino acid in the context of the full protein sequence. These representations have been used as a starting point for downstream learning tasks and, more recently, for identifying distant homologous relationships between proteins. RESULTS: In this work, we introduce a new method that generates embedding-based protein sequence alignments (EBA) and show how these capture structural similarities even in the twilight zone, outperforming both classical methods as well as other approaches based on pLMs. The method shows excellent accuracy despite the absence of training and parameter optimization. We demonstrate that the combination of pLMs with alignment methods is a valuable approach for the detection of relationships between proteins in the twilight-zone. AVAILABILITY AND IMPLEMENTATION: The code to run EBA and reproduce the analysis described in this article is available at: https://git.scicore.unibas.ch/schwede/EBA and https://git.scicore.unibas.ch/schwede/eba_benchmark.


Subject(s)
Amino Acids , Proteins , Proteins/chemistry , Amino Acid Sequence , Sequence Alignment , Language
3.
Proteins ; 91(12): 1912-1924, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37885318

ABSTRACT

The prediction of protein-ligand complexes (PLC), using both experimental and predicted structures, is an active and important area of research, underscored by the inclusion of the Protein-Ligand Interaction category in the latest round of the Critical Assessment of Protein Structure Prediction experiment CASP15. The prediction task in CASP15 consisted of predicting both the three-dimensional structure of the receptor protein as well as the position and conformation of the ligand. This paper addresses the challenges and proposed solutions for devising automated benchmarking techniques for PLC prediction. The reliability of experimentally solved PLC as ground truth reference structures is assessed using various validation criteria. Similarity of PLC to previously released complexes is employed to judge PLC diversity and the difficulty of a PLC as a prediction target. We show that the commonly used PDBBind time-split test-set is inappropriate for comprehensive PLC evaluation, with state-of-the-art tools showing conflicting results on a more representative and high quality dataset constructed for benchmarking purposes. We also show that redocking on crystal structures is a much simpler task than docking into predicted protein models, demonstrated by the two PLC-prediction-specific scoring metrics created. Finally, we introduce a fully automated pipeline that predicts PLC and evaluates the accuracy of the protein structure, ligand pose, and protein-ligand interactions.


Subject(s)
Benchmarking , Proteins , Binding Sites , Protein Binding , Ligands , Reproducibility of Results , Molecular Docking Simulation , Proteins/chemistry , Protein Conformation
4.
Proteins ; 91(12): 1811-1821, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37795762

ABSTRACT

CASP15 introduced a new category, ligand prediction, where participants were provided with a protein or nucleic acid sequence, SMILES line notation, and stoichiometry for ligands and tasked with generating computational models for the three-dimensional structure of the corresponding protein-ligand complex. These models were subsequently compared with experimental structures determined by x-ray crystallography or cryoEM. To assess these predictions, two novel scores were developed. The Binding-Site Superposed, Symmetry-Corrected Pose Root Mean Square Deviation (BiSyRMSD) evaluated the absolute deviations of the models from the experimental structures. At the same time, the Local Distance Difference Test for Protein-Ligand Interactions (lDDT-PLI) assessed the ability of models to reproduce the protein-ligand interactions in the experimental structures. The ligands evaluated in this challenge range from single-atom ions to large flexible organic molecules. More than 1800 submissions were evaluated for their ability to predict 23 different protein-ligand complexes. Overall, the best models could faithfully reproduce the geometries of more than half of the prediction targets. The ligands' size and flexibility were the primary factors influencing the predictions' quality. Small ions and organic molecules with limited flexibility were predicted with high fidelity, while reproducing the binding poses of larger, flexible ligands proved more challenging.


Subject(s)
Models, Molecular , Humans , Ligands , Binding Sites , Ions , Protein Binding , Crystallography, X-Ray
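
Note on entry 4: the symmetry correction named in BiSyRMSD can be illustrated with a minimal sketch. This is not the CASP15 implementation. It assumes the model has already been superposed on the binding site, and that the symmetry-equivalent atom orderings of the ligand are supplied explicitly as index permutations. The function names `rmsd` and `symmetry_corrected_rmsd` are hypothetical.

```python
import math


def rmsd(ref, model):
    """Root mean square deviation between two equally long lists of (x, y, z) points."""
    total = sum((a - b) ** 2 for p, q in zip(ref, model) for a, b in zip(p, q))
    return math.sqrt(total / len(ref))


def symmetry_corrected_rmsd(ref, model, symmetries):
    """Lowest RMSD over all supplied symmetry-equivalent atom orderings.

    `symmetries` is a list of permutations (lists of indices into `model`);
    it should include the identity ordering. Assumption: coordinates are
    already in a common frame, so no superposition is performed here.
    """
    return min(rmsd(ref, [model[i] for i in perm]) for perm in symmetries)
```

For a two-atom symmetric ligand whose atoms are listed in swapped order, the identity ordering alone gives a non-zero deviation, while including the swap permutation gives zero. Taking the minimum over equivalent orderings is what prevents an arbitrary atom numbering from penalizing a correct pose.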
5.
Nature ; 622(7983): 646-653, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37704037

ABSTRACT

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.


Subject(s)
Databases, Protein , Deep Learning , Molecular Sequence Annotation , Protein Folding , Proteins , Structural Homology, Protein , Amino Acid Sequence , Internet , Proteins/chemistry , Proteins/classification , Proteins/metabolism
6.
Nat Rev Drug Discov ; 22(11): 895-916, 2023 11.
Article in English | MEDLINE | ID: mdl-37697042

ABSTRACT

Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.


Subject(s)
Artificial Intelligence , Biological Products , Humans , Algorithms , Machine Learning , Drug Discovery , Drug Design , Biological Products/pharmacology
7.
Proteins ; 91(12): 1571-1599, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37493353

ABSTRACT

We present an in-depth analysis of selected CASP15 targets, focusing on their biological and functional significance. The authors of the structures identify and discuss key protein features and evaluate how effectively these aspects were captured in the submitted predictions. While the overall ability to predict three-dimensional protein structures continues to impress, reproducing uncommon features not previously observed in experimental structures is still a challenge. Furthermore, instances with conformational flexibility and large multimeric complexes highlight the need for novel scoring strategies to better emphasize biologically relevant structural regions. Looking ahead, closer integration of computational and experimental techniques will play a key role in determining the next challenges to be unraveled in the field of structural molecular biology.


Subject(s)
Computational Biology , Proteins , Protein Conformation , Models, Molecular , Computational Biology/methods , Proteins/chemistry
8.
Proteins ; 91(12): 1550-1557, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37306011

ABSTRACT

Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.


Subject(s)
Computational Biology , Proteins , Protein Conformation , Proteins/chemistry , Models, Molecular , Ligands
9.
Comput Struct Biotechnol J ; 21: 630-643, 2023.
Article in English | MEDLINE | ID: mdl-36659927

ABSTRACT

Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across the literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.

10.
Nat Struct Mol Biol ; 29(11): 1056-1067, 2022 11.
Article in English | MEDLINE | ID: mdl-36344848

ABSTRACT

Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.


Subject(s)
Computational Biology , Furylfuramide , Computational Biology/methods , Binding Sites , Proteins/chemistry , Databases, Protein , Protein Conformation
11.
New Phytol ; 235(5): 1884-1899, 2022 09.
Article in English | MEDLINE | ID: mdl-35612785

ABSTRACT

Strigolactones (SLs) are rhizosphere signalling molecules and phytohormones. The biosynthetic pathway of SLs in tomato has been partially elucidated, but the structural diversity in tomato SLs predicts that additional biosynthetic steps are required. Here, root RNA-seq data and co-expression analysis were used for SL biosynthetic gene discovery. This strategy resulted in a candidate gene list containing several cytochrome P450s. Heterologous expression in Nicotiana benthamiana and yeast showed that one of these, CYP712G1, can catalyse the double oxidation of orobanchol, resulting in the formation of three didehydro-orobanchol (DDH) isomers. Virus-induced gene silencing and heterologous expression in yeast showed that one of these DDH isomers is converted to solanacol, one of the most abundant SLs in tomato root exudate. Protein modelling and substrate docking analysis suggest that hydroxy-orobanchol is the likely intermediate in the conversion from orobanchol to the DDH isomers. Phylogenetic analysis demonstrated the occurrence of CYP712G1 homologues in the Eudicots only, which fits with the reports on DDH isomers in that clade. Protein modelling and orobanchol docking of the putative tobacco CYP712G1 homologue suggest that it can convert orobanchol to similar DDH isomers as tomato.


Subject(s)
Solanum lycopersicum , Catalysis , Cytochrome P-450 Enzyme System/genetics , Cytochrome P-450 Enzyme System/metabolism , Heterocyclic Compounds, 3-Ring , Lactones/metabolism , Solanum lycopersicum/genetics , Solanum lycopersicum/metabolism , Phylogeny , Plant Growth Regulators/metabolism , Rhizosphere , Saccharomyces cerevisiae/metabolism , Nicotiana/genetics , Nicotiana/metabolism
12.
PLoS Comput Biol ; 17(3): e1008197, 2021 03.
Article in English | MEDLINE | ID: mdl-33750949

ABSTRACT

Sesquiterpene synthases (STSs) catalyze the formation of a large class of plant volatiles called sesquiterpenes. While thousands of putative STS sequences from diverse plant species are available, only a small number of them have been functionally characterized. Sequence identity-based screening for desired enzymes, often used in biotechnological applications, is difficult to apply here as STS sequence similarity is strongly affected by species. This calls for more sophisticated computational methods for functionality prediction. We investigate the specificity of precursor cation formation in these elusive enzymes. By inspecting multi-product STSs, we demonstrate that STSs have a strong selectivity towards one precursor cation. We use a machine learning approach combining sequence and structure information to accurately predict precursor cation specificity for STSs across all plant species. We combine this with a co-evolutionary analysis on the wealth of uncharacterized putative STS sequences, to pinpoint residues and distant functional contacts influencing cation formation and reaction pathway selection. These structural factors can be used to predict and engineer enzymes with specific functions, as we demonstrate by predicting and characterizing two novel STSs from Citrus bergamia.


Subject(s)
Alkyl and Aryl Transferases/metabolism , Evolution, Molecular , Machine Learning , Plants/enzymology , Sesquiterpenes/metabolism , Alkyl and Aryl Transferases/chemistry , Amino Acid Sequence , Cations , Protein Conformation , Sequence Homology, Amino Acid , Substrate Specificity
13.
Bioinformatics ; 36(Suppl_2): i718-i725, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381814

ABSTRACT

MOTIVATION: As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well. RESULTS: We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search and unsupervised clustering to structure classification across proteins from different superfamilies as well as within the same family. AVAILABILITY AND IMPLEMENTATION: Python code available at https://git.wur.nl/durai001/geometricus.


Subject(s)
Algorithms , Proteins , Cluster Analysis , Machine Learning
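
Note on entry 13: the idea of discretizing fragments into shape-mers and counting them can be sketched in a few lines. This is a simplified illustration, not the Geometricus code or its API. As an assumption, it uses a single invariant (mean squared distance to the fragment centroid, which is unchanged by rotation and translation) instead of the set of 3D moment invariants the abstract refers to, and the function names, the fragment length `k`, and the `resolution` parameter are hypothetical.

```python
import math
from collections import Counter


def fragment_invariant(coords):
    """Mean squared distance of the points to their centroid.

    Stand-in for the moment invariants mentioned in the abstract (assumption).
    """
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    return sum(sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords) / n


def shape_mers(coords, k=4, resolution=1.0):
    """Slide a window of k consecutive points and discretize each invariant to an integer."""
    return [
        int(math.log1p(fragment_invariant(coords[i:i + k])) * resolution)
        for i in range(len(coords) - k + 1)
    ]


def count_vector(mers, vocabulary):
    """Describe a structure as counts of each shape-mer in a fixed vocabulary."""
    counts = Counter(mers)
    return [counts[m] for m in vocabulary]
```

Because every structure is mapped onto the same vocabulary, structures of different lengths yield vectors of equal size, which is what makes the representation usable as fixed-length input for clustering or classification.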
14.
Arch Biochem Biophys ; 695: 108647, 2020 11 30.
Article in English | MEDLINE | ID: mdl-33121934

ABSTRACT

Plant terpene synthases (TPSs) can mediate formation of a large variety of terpenes, and their diversification contributes to the specific chemical profiles of different plant species and chemotypes. Plant genomes often encode a number of related terpene synthases, which can produce very different terpenes. The relationship between TPS sequence and resulting terpene product is not completely understood. In this work we describe two TPSs from the Camphor tree Cinnamomum camphora (L.) Presl. One of these, CiCaMS, acts as a monoterpene synthase (monoTPS), and mediates the production of myrcene, while the other, CiCaSSy, acts as a sesquiterpene synthase (sesquiTPS), and catalyses the production of α-santalene, β-santalene and trans-α-bergamotene. Interestingly, these enzymes share 97% DNA sequence identity and differ in only 22 amino acid residues out of 553. To understand which residues are essential for the catalysis of monoterpenes and sesquiterpenes, respectively, a number of hybrid synthases were prepared, and supplemented by a set of single-residue variants. These were tested for their ability to produce monoterpenes and sesquiterpenes by in vivo production of sesquiterpenes in E. coli, and by in vitro enzyme assays. This analysis pinpointed three residues in the sequence which could mediate the change in product specificity from a monoterpene synthase to a sesquiterpene synthase. Another set of three residues defined the sesquiterpene product profile, including the ratios between sesquiterpene products.


Subject(s)
Alkyl and Aryl Transferases/chemistry , Cinnamomum camphora/enzymology , Monoterpenes/chemistry , Plant Proteins/chemistry , Sesquiterpenes/chemistry , Alkyl and Aryl Transferases/genetics , Alkyl and Aryl Transferases/metabolism , Cinnamomum camphora/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Monoterpenes/metabolism , Plant Proteins/genetics , Plant Proteins/metabolism , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Sesquiterpenes/metabolism
15.
Comput Struct Biotechnol J ; 18: 981-992, 2020.
Article in English | MEDLINE | ID: mdl-32368333

ABSTRACT

The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta's performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases.

16.
Phytochemistry ; 158: 157-165, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30446165

ABSTRACT

Plants exhibit a vast array of sesquiterpenes, C15 hydrocarbons which often function as herbivore-repellents or pollinator-attractants. These in turn are produced by a diverse range of sesquiterpene synthases. A comprehensive analysis of these enzymes in terms of product specificity has been hampered by the lack of a centralized resource of sufficient functionally annotated sequence data. To address this, we have gathered 262 plant sesquiterpene synthase sequences with experimentally characterized products. The annotated enzyme sequences allowed for an analysis of terpene synthase motifs, leading to the extension of one motif and recognition of a variant of another. In addition, putative terpene synthase sequences were obtained from various resources and compared with the annotated sesquiterpene synthases. This analysis indicated regions of terpene synthase sequence space which so far are unexplored experimentally. Finally, we present a case describing mutational studies on residues altering product specificity, for which we analyzed conservation in our database. This demonstrates an application of our database in choosing likely-functional residues for mutagenesis studies aimed at understanding or changing sesquiterpene synthase product specificity.


Subject(s)
Alkyl and Aryl Transferases/chemistry , Alkyl and Aryl Transferases/metabolism , Plant Proteins/metabolism , Sesquiterpenes/metabolism , Alkyl and Aryl Transferases/genetics , Amino Acid Motifs , Amino Acid Sequence , Conserved Sequence , Databases, Protein , Phylogeny , Plant Proteins/genetics , Sesquiterpenes/chemistry , Substrate Specificity